Similarity measures for vocal-based drum sample retrieval using deep convolutional auto-encoders
Abstract
The expressive nature of the voice provides a powerful medium for communicating sonic ideas, motivating recent research on methods for query by vocalisation. Meanwhile, deep learning methods have demonstrated state-of-the-art results for matching vocal imitations to imitated sounds, yet little is known about how well learned features represent the perceptual similarity between vocalisations and queried sounds. In this paper, we address this question using similarity ratings between vocal imitations and imitated drum sounds. We use a linear mixed-effects regression model to show how well features learned by convolutional auto-encoders (CAEs) perform as predictors of perceptual similarity between sounds. Our experiments show that CAEs outperform three baseline feature sets (spectrogram-based representations, MFCCs, and temporal features) at predicting the subjective similarity ratings. We also investigate how the size and shape of the encoded layer affect the predictive power of the learned features. The results show that preserving temporal information is more important than spectral resolution for this application.
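The abstract describes comparing learned CAE features of a vocal imitation against those of candidate drum sounds. As a minimal sketch of that retrieval step, assume each sound's encoded-layer output has already been computed and that Euclidean distance serves as the similarity measure (an assumption; the abstract does not name the measure used). The helper names and sample names below are hypothetical, and the mixed-effects regression against listener ratings is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers: a sketch of query-by-vocalisation retrieval, assuming
# each sound has already been passed through a trained CAE encoder.


def flatten(latent: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a 2-D encoded-layer output (e.g. time x channels) row by row."""
    return [value for row in latent for value in row]


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (assumed similarity measure)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_drum_samples(
    query: Sequence[float], library: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (sample name, distance) pairs, most similar (smallest distance) first."""
    scored = [(name, euclidean_distance(query, feats)) for name, feats in library.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy 2 x 2 "encoded layers", flattened to 4-D vectors; the values are made up.
    vocal_query = flatten([[0.9, 0.1], [0.0, 0.2]])
    drum_library = {
        "kick_01": flatten([[1.0, 0.0], [0.0, 0.1]]),
        "snare_03": flatten([[0.2, 0.8], [0.5, 0.0]]),
        "hihat_07": flatten([[0.0, 0.1], [0.9, 0.9]]),
    }
    for name, dist in rank_drum_samples(vocal_query, drum_library):
        print(f"{name}: {dist:.3f}")
```

With these toy vectors, kick_01 ranks first (distance about 0.173), followed by snare_03 and hihat_07. Row-major flattening keeps each time frame's values in a fixed position, so sounds whose energy falls at different times yield different vectors; a cosine distance could be substituted if overall level should be ignored.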
Similar resources
Effective Multi-Modal Retrieval based on Stacked Auto-Encoders
Multi-modal retrieval is emerging as a new search paradigm that enables seamless information retrieval from various types of media. For example, users can simply snap a movie poster to search for relevant reviews and trailers. To solve the problem, a set of mapping functions is learned to project high-dimensional features extracted from data of different media types into a common low-dimensional sp...
A Radon-based Convolutional Neural Network for Medical Image Retrieval
Image classification and retrieval systems have gained more attention because of easier access to high-tech medical imaging. However, the lack of large-scale, balanced, labelled data in medicine is still a challenge. Simplicity, practicality, efficiency, and effectiveness are the main targets in the medical domain. To achieve these goals, Radon transformation, which is a well-known t...
A Particle Swarm Optimization-based Flexible Convolutional Auto-Encoder for Image Classification
Convolutional auto-encoders have shown remarkable performance when stacked into deep convolutional neural networks for classifying image data over the past several years. However, they are unable to construct state-of-the-art convolutional neural networks due to their intrinsic architectures. In this regard, we propose a flexible convolutional auto-encoder by eliminating the constraints on...
Towards a comprehensive dataset of vocal imitations of drum sounds
The voice is a rich and powerful means of expressing acoustic concepts such as musical sounds. Recent research on vocal imitations has demonstrated the viability of using the voice to search for sounds, using query by vocalisation. Here we present the methods used to develop a dataset for evaluating the performance of query by vocalisation systems for drum sounds. The dataset consists of imitat...
A Pitfall of Unsupervised Pre-Training
In this paper we thoroughly investigate the quality of features produced by deep neural network architectures obtained by stacking and convolving Auto-Encoders. In particular, we are interested in the relation between their reconstruction score and their performance on document layout analysis. When using Auto-Encoders, intuitively one could assume that features which are good for reconstruction ...
Journal title:
- CoRR
Volume: abs/1802.05178
Publication date: 2018